perf: Implement groups accumulator count distinct primitive types#21561
perf: Implement groups accumulator count distinct primitive types#21561coderfender wants to merge 2 commits intoapache:mainfrom
Conversation
|
Requesting to run benchmarks TPCH , TPCDS , clickbench etc to see how groups accumulator impl performs in count(distinct) use case |
|
run benchmark tpch tpcds clickbench_partitiomed clickbemch_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (f0b2a4a) to 16e578d (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
1 similar comment
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
5692893 to
ee0c865
Compare
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
@Dandandan could you retrigger the benchmarks again please? |
|
run benchmark tpch tpcds clickbench_partitioned clickbench_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: clickbench_extended File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (ee0c865) to 16e578d (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
File an issue against this benchmark runner |
5b7f593 to
067a6b6
Compare
|
Comparison with main |
|
run benchmark tpch tpcds clickbench_partitioned clickbench_extended |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (8d60834) to a13c23d (merge-base) diff using: clickbench_extended File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (8d60834) to a13c23d (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
| if args.is_distinct { | ||
| let data_type = args.expr_fields[0].data_type(); | ||
| return match data_type { | ||
| DataType::Int8 => Ok(Box::new( |
There was a problem hiding this comment.
Perhaps we can avoid using a hash set as well here like https://github.com/apache/datafusion/pulls?q=is%3Apr+is%3Aopen+hashset ?
There was a problem hiding this comment.
Sorry that link routes me to all open PRs using hashset . Not sure which one would you like me to refer to @Dandandan ?
There was a problem hiding this comment.
Yup ! thats the plan once we fix on the right group accumulator strategy
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (8d60834) to a13c23d (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (8d60834) to a13c23d (merge-base) diff using: tpcds File an issue against this benchmark runner |
9daa273 to
d2c5830
Compare
|
Seems like merge history is messed up when I cherry picked a commit . I will clean this up to have 2 commits 1.hashtable approach and the current one : Vector of group , count tuples to measure performance |
add count distinct group benchmarks add count distinct group benchmarks count group benchmark check count group benchmark check init implement_group_accumulators_count_distinct implement_group_accumulators_count_distinct implement_group_accumulators_count_distinct implement_group_accumulators_count_distinct implement_group_accumulators_count_distinct_use_hashtable implement_group_accumulators_count_distinct_use_hashtable add group benches Use same benchmark names for comparison count group benchmark check count group benchmark check
expl_tuple_vec_on_demand_sort
219bc4f to
28c607a
Compare
|
@Dandandan , I am trying to a Vector of pairs (group ID , value) approach to see if SIMD (sort and return group counts only during Could you please trigger CI benchmarks to see if this is adding value over hashtable approach ? |
|
run benchmark tpch tpcds clickbench_partitioned clickbench_extended |
It seems to me the problem would be memory use (whenever it is non-unique)? |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (28c607a) to 2818abb (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (28c607a) to 2818abb (merge-base) diff using: clickbench_extended File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (28c607a) to 2818abb (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing implement_groups_accumulator_count_distinct_primitive_types (28c607a) to 2818abb (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
File an issue against this benchmark runner |
|
@Dandandan , there is a consistently higher memory usage with no proportional speedups.The current approach packs group and index into a single
Yup ! It seems like the memory spikes are not worth the minor speed ups (perhaps even noise) |
|
I am trying an approach with storing counts per group along with hashtable to see if we could speedup |
Which issue does this PR close?
Evaluate perf with group accumulators for count distinct
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?